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1  Introduction 


In  this  paper,  we  introduce  a  new  version  of  the  Fast  Multipole  Method  (FMM)  for  the  evaluation 
of  potential  fields  in  three  dimensions.  The  scheme  evaluates  all  pairwise  interactions  in  large 
ensembles  of  particles,  i.e.  expressions  of  the  form 


^(®i) 


E 

iszl 


(1) 


for  the  gravitational  or  electrostatic  potential  and 


n 


B(*i)  =  E«- 

f=l 


(2) 


for  the  field,  where  xi,  X2,  •  •  • ,  Xn  are  points  in  2?^,  and  ,  92,  *  •  • ,  9n  are  a  set  of  (real)  coefficients. 

The  evaluation  of  expressions  of  the  form  (1)  is  closely  related  to  a  number  of  important 
problems  in  applied  mathematics,  physics,  chemistry,  and  biology.  Molecular  dynamics  and 
Hartree-Fock  calculations  in  chemistry,  the  evolution  of  large-scale  gravitational  systems  in  as¬ 
trophysics,  capacitance  extraction  in  electrical  engineering,  and  vortex  methods  in  fluid  dynamics 
are  all  examples  of  areas  where  simulations  require  rapid  and  accurate  evaluation  of  sums  of  the 
form  (1)  and  (2).  When  certain  closely  related  interactions  are  considered  as  well,  involving 
expressions  of  the  form 

^(®i)  =  ^  ft  •  -ITT — nr .  (3) 


«=i 

•*3 


\\Xj  -  Xi| 


the  list  of  applications  becomes  even  more  extensive. 

This  paper  is  a  continuation  (after  an  interval  of  several  years)  of  a  sequence  of  joint  papers 
by  the  authors,  starting  with  (Greengard  and  Rokhlin  1987)  and  (Carrier  et  al.  1988)  which 
introduced  the  Fast  Multipole  Method  in  two  dimensions.  Subsequent  work  extended  the  method 
to  three  dimensions  (Greengard  1988;  Greengard  and  Rokhlin  1988a,b),  and  there  followed  a 
number  of  versions  of  the  scheme,  both  by  the  present  authors  and  by  other  researchers  (see,  for 
example,  Anderson  1992;  Nabors  et  al.  1994;  Berman  1995;  Epton  and  Dembart  1995;  Elliott 
and  Board  1996).  After  about  ten  years  of  research,  however,  a  somewhat  unsatisfactory  picture 
has  emerged.  In  short,  there  now  exist  extremely  efficient  algorithms  for  the  evaluation  of  the 
two-dimensional  analogues  of  (1),  (2)  with  (practically)  arbitrarily  high  precision,  as  well  as 
very  efficient  and  accurate  algorithms  for  a  host  of  related  problems  (Rokhlin  1988;  Alpert  and 
Rokhlin  1991;  Beylkin  et  al.  1991;  Coifman  and  Meyer  1991;  Greengard  and  Strain  1991;  Strain 
1991;  Alpert  et  al.  1993).  However,  for  the  sums  (1)  and  (2),  there  are  few  practical  schemes, 
and  these  provide  only  limited  accuracy.  Since  most  real-world  problems  are  three-dimensional, 
it  can  be  said  that  analysis-based  “fast”  methods  are  a  promising  group  of  techniques,  but  that 
they  have  not  yet  lived  up  to  all  their  expectations. 

In  the  present  paper,  we  try  to  remedy  this  situation.  We  describe  a  version  of  the  Fast  Mul¬ 
tipole  Method  in  three  dimensions  that  produces  high  accuracy  at  an  acceptable  computational 
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cost.  As  will  be  seen  from  the  numerical  examples  in  Section  9,  the  new  scheme  has  a  break¬ 
even  point  of  n  ~  2000  when  compared  with  direct  calculation  in  single  precision;  with  10-digit 
accuracy,  the  break-even  point  is  n  ~  5000;  with  3-digit  accuracy,  it  is  n  ~  500.  The  approach 
uses  a  considerably  more  involved  mathematical  (and  numerical)  apparatus  than  is  customary 
in  the  design  of  fast  multipole-type  algorithms.  This  apparatus  is  based  on  a  new  diagonal  form 
for  translation  operators  acting  on  harmonic  functions,  extending  the  two-dimensional  version 
introduced  in  (Hrycak  and  Rokhlin  1995).  The  overall  approach  bears  some  resemblance  to 
that  used  in  fast  multipole  methods  for  high  frequency  scattering  problems,  which  are  based  on 
diagonal  forms  of  translation  operators  for  the  Helmholtz  equation  (Rokhlin  1990,  1995;  Epton 
and  Dembart  1995). 


2  Philosophical  Preliminaries 

We  begin  with  an  overview  of  analysis  based  “fast”  numerical  algorithms,  concentrating  on  the 
evaluation  of  expressions  of  the  form  (1).  Where  possible,  we  summarize  the  current  “state  of 
the  art”  in  the  field. 

If  we  define  the  n  x  n-matrix  A  by  the  formula 


we  can  rewrite  (1)  in  the  form 

$  =  Aq,  (5) 

with  (the  expression  (2)  can  be  rewritten  in  a  similar  fashion).  Obviously,  straight¬ 

forward  evaluation  of  either  of  the  expressions  (1),  (2)  requires  0{n?)  operations  (evaluating  n 
potentials  at  n  points),  and  for  large-scale  problems  this  estimate  is  prohibitively  large.  On  the 
other  hand,  the  evaluation  of  expressions  of  the  forms  (1),  (2)  is  an  integral  part  of  the  numerical 
solution  of  many  important  problems  in  applied  mathematics,  and  during  the  last  decade,  several 
“fast”  schemes  have  been  proposed  for  this  purpose,  i.e.  schemes  whose  computational  cost  is 
less  than  O(n^).  Typically,  such  methods  require  0{n)  or  0(n  -  log  n)  operations  (Rokhlin  1985; 
Greengard  and  Rokhlin  1987;  Carrier  et  al.  1988;  Rokhlin  1988, 1990;  Alpert  and  Rokhlin  1991; 
Beylkin  et  al.  1991;  Coifman  and  Meyer  1991;  Greengard  and  Strain  1991;  Strain  1991,  1992; 
Epton  and  Dembart  1995).  All  of  them  are  based  on  the  straightforward  observation  that  the 
potentials  are  smooth  functions  in  except  when  Xi  is  near  xj,  and  as  a  result,  large  subma¬ 
trices  of  A  are  of  low  rank  (to  any  finite  precision).  Clearly,  applying  a  matrix  of  dimension 
n  X  n  but  rank  J  to  an  arbitrary  vector  requires  only  n  •  J  operations  (as  opposed  to  «^);  this 
simple  observation  leads  directly  to  a  variety  of  asymptotically  “fast”  schemes  for  the  evaluation 
of  (1);  below,  we  illustrate  the  construction  of  such  schemes  with  a  simple  example. 

Suppose  that,  in  the  expression  (1),  the  points  ®i, ®2t '  *  •  >  ®n  are  equispaced  and  lie  on  the 
interval  [—1, 1],  so  that 

Xi  =  — 1,  X2  =  ~1  -f-  h,  •  •  • ,  1  —  1  —  h,  Xfi  —  1,  (6) 
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Figure  1:  Subdivision  of  matrix  into  well-separated  blocks.  The  submatrices  marked  by  an  X 
are  not  well-separated  from  the  diagonal. 

where  h  =  2/(n  —  1).  Given  three  integers  I,  m,  k  such  that 

1  <  l<n, 

1  <  m  <  n, 

1  <  k  <  n  —  I,  (7) 

1  <  k  <  n  —  m, 

we  will  denote  by  Ai^m,k  the  submatrix  of  A  consisting  of  such  elements  A,j  that 

I  <  i  <  I  +  k  —  1, 


m  <  j  <  Tn  +  k  —  1, 

(8) 

and  say  that  Ai^rn,k  is  separated  from  the  diagonal  if 

1  /  —  {m-h  A:  -  1)  |>  fc. 

(9) 

and 

1  m  —  (I  -1-  fc  —  1)  |>  A;. 

(10) 

In  other  words,  we  will  say  that  the  submatrix  Ai^rn,k  of  the  matrix  A  is  separated  from  the 
diagonal  if  its  distance  from  the  diagonal  of  A  is  greater  than  or  equal  to  its  own  size  (Fig. 
1).  We  will  construct  a  rudimentary  “fast”  algorithm  for  the  application  of  the  matrix  A  to  an 
arbitrary  vector  by  means  of  the  following  lemma;  its  proof  is  based  on  several  well-known  facts, 
all  of  which  can  be  found  in  (Dahlquist  and  Bjork  1974). 
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Lemma  2.1  For  any  integer  p  <  1,  and  any  l,m,k  satisfying  the  conditions  (7),  there  exists  a 
matrix  Bi^rn,k  of  dimension  k  X  k  and  rank  J ,  such  that 

||^/,m,fc  “  Bl,m,k\\  — 

In  other  words,  any  submatrix  of  A  separated  from  the  diagonal  is  of  rank  J,  to  the  precision 
1/4-^. 

Outline  of  proof . 

We  start  by  defining  the  function  /  ;  by  the  formula 

and  observing  that  /  is  smooth  everywhere  in  R"^,  except  when  x  =  y.  We  will  say  that  the 
square  [a,  a  +  c]  x  [6, 6  +  c]  C  is  separated  from  the  the  diagonal  if 

I  a  +  c  —  6  |>  c,  (13) 


CLllU. 

\b  +  c  —  a\>c,  (14) 

and  observe  that  on  any  such  square,  the  function  /  can  be  expanded  in  a  two-dimensional 
Chebychev  series,  i.e.  represented  in  the  form 

„  s  m/2-a:  c-l-2-a^  c  +  2-b  ,  . 

/(®,  y)=  •  Tpi— - - — )  •  ^?(”7  Z 

p,q=0 

with  Tj  denoting  the  j-th  Chebychev  polynomial.  Finally,  we  observe  that  for  any  a,b,c  satisfying 
the  conditions  (13),  (14),  the  convergence  of  the  expansion  (15)  is  given  by  the  formula 

,  2 * X  c “t* 2*flv  m/2*2/  c 4* 2*5^i  1  /i/j\ 

\f{x,y)-  - Z — 

p,q=0 

In  other  words,  for  any  square  separated  from  the  diagonal,  the  expansion  (15)  converges  to 
accuracy  s  after  no  more  than  log4[e)  terms.  Combining  (9),  (10)  and  (1)  with  (15)  and  (16), 
we  observe  that  for  any  i,j  satisfying  the  inequalities  (8), 

with  a  =  (2  •/)/»-  1,  6  =  (2  •  m)/n  -  1,  c  =  (2  •  k)/n.  The  matrix  Bi,m,k  defined  by 

V  ^  c-l-2-a^  rrf^-Vi  ns'l 

{Bl,m,k)ij  —  Y  ^P9'^P^  c  C  C  C 

p,q=0 
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clearly  satisfies  the  desired  condition  (11), 

□ 

In  order  to  develop  a  fast  algorithm,  we  first  subdivide  the  matrix  A  into  a  collection  of 
submatrices,  as  depicted  in  Fig.  1.  Each  of  the  submatrices  in  this  structure  is  separated  from 
the  diagonal,  except  the  submatrices  near  the  diagonal  whose  ranks  are  small  simply  because 
their  dimensionality  is  small.  By  virtue  of  Lemma  2.1,  each  of  the  separated  submatrices  is 
of  rank  J,  to  the  accuracy  4"'^.  In  order  to  apply  A  to  an  arbitrary  vector  with  fixed  but 
finite  accuracy  (which  is  always  the  case  in  numerical  computations),  we  can  apply  each  of  the 
submatrices  to  the  appropriate  part  of  the  vector  for  a  cost  proportional  to  k  •  J,  where  k  is  the 
size  of  the  submatrix.  Adding  up  the  costs  for  all  such  submatrices,  we  obtain  the  operation 
count  of 


J  •  71  •  log(n)  log(-)  •  n  •  log(n), 


(19) 


instead  of 

The  scheme  outlined  above  is  extremely  simple,  but  representative  of  the  current  approach 
to  the  design  of  ‘^ast”  summation  algorithms.  Several  comments  are  in  order. 

1.  It  is  easy  to  see  that  the  matrix  A  defined  in  (4)  with  the  spacing  defined  by  (6)  is  in  fact  a 
Toeplitz  matrix  that  can  be  applied  to  an  arbitrary  vector  for  a  cost  proportional  to  n'log(n)  via 
the  Fast  Fourier  Transform.  This  situation  occurs  sometimes,  both  in  one  and  higher  dimensions. 
However,  the  Toeplitz  nature  of  the  matrix  A  is  lost  when  the  points  are  not  distributed  on  a 
uniform  grid,  and  direct  application  of  the  FFT  becomes  impossible.  For  “somewhat  uniformly” 
distributed  points  Xj,  various  types  of  local  corrections  have  been  successfully  utilized.  When  the 
points  are  not  distributed  uniformly  (for  example,  on  a  curve  or  surface),  FFT-based  methods 
become  ineffective. 


2.  As  described,  the  scheme  is  only  applicable  to  one-dimensional  problems,  and  under  very 
limited  conditions.  In  most  situations,  the  subdivision  of  the  matrix  has  to  be  modified,  taking 
into  account  the  geometric  distribution  of  points  in  order  to  locate  submatrices  whose  “numerical 
rank”  is  low.  Examples  of  such  subdivisions  can  be  found  in  (Carrier  et  al,  1988;  Van  Dommelen 
and  Rundensteiner  1989;  Beylkin  et  aL  1991;  Nabors  et  al,  1994) 

3.  The  scheme  is  extremely  simple  and  general.  It  is  entirely  unrelated  to  the  detailed  nature  of 
the  matrix  A,  needing  only  some  inequality  like  (16).  In  other  words,  so  long  as  the  entries  of 
the  matrix  A  are  smooth  functions  of  their  indices  away  from  the  diagonal,  a  scheme  of  the  type 
outline  above  will  work.  In  fact,  even  that  is  not  necessary;  the  elements  of  the  matrix  have  only 
to  be  sufficiently  smooth  functions  of  their  indices  on  a  sufficiently  large  part  of  the  matrix, 

4.  The  scheme  admits  a  large  number  of  modifications;  the  most  obvious  ones  replace  the  Cheby- 
chev  expansion  in  (15)  with  other  approximations;  one  should  be  careful  in  doing  so,  since  under 
many  conditions  the  Chebychev  approximation  is  optimal  (among  polynomial  approximations), 
or  nearly  so.  Some  of  the  special-purpose  approximation  schemes  which  have  been  used  success¬ 
fully  employ  wavelets  and  related  bases  (Beylkin  et  al,  1991;  Alpert  et  al,  1993). 

Another  obvious  modification  is  a  change  in  the  choice  of  submatrices  of  low  rank;  the  use  of 
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rectangular  subniatrices  (as  opposed  to  the  square  ones  in  Fig.  1)  permits  coarser  subdivisions 
and  tends  to  result  in  more  efficient  algorithms. 

5.  Algorithms  of  the  type  described  above  usually  do  not  work  for  problems  where  the  matrix 
A  is  a  discretization  of  an  integral  operator  with  an  oscillatory  kernel,  since  such  discretizations 
(normally)  have  a  more  or  less  constant  number  of  nodes  per  wavelength  of  the  dominant  os¬ 
cillation.  As  a  result,  the  rank  of  each  submatrix  is  proportional  to  its  size,  and  the  resulting 
algorithms  have  CPU  time  estimates  of  the  order  O(n^).  Sometimes,  the  calculation  can  be  accel¬ 
erated  by  reducing  the  size  of  the  constant  (Wagner  et  al.  1994),  but  the  asymptotic  complexity 
in  such  cases  is  the  same  as  for  the  direct  approach.  For  certain  classes  of  oscillatory  problems 
(such  as  Helmholtz  and  Schrodinger  equations  at  high  frequency),  there  exist  asymptotically 
“fast”  schemes  based  on  a  different  (and  considerably  more  involved)  analytical  apparatus  (see, 
for  example,  Rokhlin  1988,  1990,  1993;  Canning  1989,  1992,  1993;  Coffman  and  Meyer  1991; 
Bradie  et  al  1993;  Coffman  et  al.  1993,  1994;  Wagner  and  Chew  1994;  Epton  and  Dembart 
1995).  As  noted  in  the  introduction,  these  schemes  are  related  to  the  scheme  we  will  present 
below.  They  are,  however,  outside  the  scope  of  this  paper. 


3  Mathematical  Preliminaries  I 


In  this  section,  we  briefly  derive  the  multipole  expansion  of  a  charge  distribution  and  refer 
the  reader  to  Kellogg  (1953),  Jackson  (1975),  Wallace  (1984),  and  Greengard  (1988)  for  more 
detailed  discussions. 

If  a  point  charge  of  strength  q  is  located  at  Pq  =  (xq,  yo,  «o),  then  the  potential  and  electro¬ 
static  field  due  to  this  charge  at  a  distinct  point  P  =  (x,  y,  z)  are  given  by 

#  =  1  (20) 

K 


and 


E  =  -V#=(-^, 


y-yo  z-zq. 

R3  ’  R3 


(21) 


respectively,  where  R  denotes  the  distance  between  points  Pq  and  P . 

We  would  like  to  derive  a  series  expansion  for  the  potential  at  P  in  terms  of  its  distance 
from  the  origin  r.  For  this,  let  the  spherical  coordinates  of  P  be  (r,  0, 4>)  and  of  Pq  be  (p,  o,  P). 
Letting  7  be  the  angle  between  the  vectors  P  and  Pq,  we  have  from  the  law  of  cosines 


H-  -  2rpcos7, 


(22) 


with 


cos  7  =  cos  ^  cos  a  -f  sin  5  sin  a  cos(<^  —  /?) . 


Thus, 


1 _ 1 _ _ 1 

^  ~  rv^l-2fcos7  +  $  ~  +  ’ 


(23) 

(24) 
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having  set 


-  and  u  =  cos  7  . 
r 


(25) 


For  //  <  1,  we  may  expand  the 
form 


inverse  square  root  in  powers  of  fj,,  resulting  in  a  series  of  the 


1 

y/l  —  2ufl  + 


71=0 


(26) 


where  „ 

Po{u)  =  l,  Pi{u)  =  u,  P2{u)  =  -{u‘^ (27) 


and,  in  general,  F„(m)  is  the  Legendre  polynomial  of  degree  n.  Our  expression  for  the  field  now 
takes  the  form 

-t  00  _T). 

(28) 


1 

p  =  1] 


R 


n=s0 


The  angular  parameter  w,  however,  depends  on  both  the  source  and  the  target  locations. 
A  more  general  representation  will  require  the  introduction  of  spherical  harmonics,  which  are 
solutions  of  the  Laplace  equation  obtained  by  separation  of  variables  in  spherical  coordinates. 
Any  harmonic  function  $  can  be  expanded  in  the  form 


00  n  ✓  \ 

*=E  E  (29) 

71=0  7n=— 71  ^ 

The  terms  Y^{d^  (j>)r'^  are  referred  to  as  spherical  harmonics  of  degree  n  or  solid  harmonics^ 
the  terms  (f>)/r'^'^^  are  called  spherical  harmonics  of  degree  — n  —  1  or  multipoles^  and  the 

coefficients  and  are  known  as  the  moments  of  the  expansion. 

The  spherical  harmonics  can  be  expressed  in  terms  of  partial  derivatives  of  l/r  (Wallace 


1984)  as 

_  ,0  s"  , 

rn+l  -  ' 

(30) 

For  m  >  0,  we  have 

rn+1  '■ax*  dy'  ' 

r  Qy-m  .y 

VrJ  ’ 

(31) 

and 

(32) 

where 

(-1)” 

jm  _  V  2.; 

(33) 

”  V^(n  —  m)!  •  (n  +  m)! 

They  also  satisfy  the  relation 


where  we  have  omitted  the  normalization  factor  of  i/(2n  +  l)/47r,  to  match  the  definitions  (30) 
-  (32)  given  above.  The  special  functions  are  called  associated  Legendre  functions  and  can 
be  defined  by  the  Rodrigues’  formula 

jm 

Theorem  3.1  (Addition  Theorem  for  Legendre  Polynomials)  Let  P  and  Q  be  points  with 
spherical  coordinates  (r,  9,  <f>)  and  {p,  a,p),  respectively,  and  let  j  be  the  angle  subtended  between 
them.  Then 

71 

(35) 


F»(cos7)=  E 

Combining  Theorem  3.1  and  eq.  (28),  we  have 


,,n+l 


(36) 


71=0  m=—n 


It  is  now  straightforward  to  expand  the  field  due  to  a  collection  of  sources  in  terms  of 
multi  poles. 

Theorem  3.2  (Multipole  Expansion).  Suppose  that  k  charges  of 

strengths  {g,-,  i  =  1, ...,  k}  are  located  at  the  points  {Qi  =  {pi,  a,-,  0i),  i  =  1, ...,  k},  with  |/),|  <  a. 
Then  for  any  P  =  (r,  9,  with  r  >  a,  the  potential  ^(P)  is  given  by 


OO  n 

*(p)  =  E  E 


71=0  7n=— n 


where 


Furthermore,  for  any  p  >  1, 


1=1 


P  n  J^m 

w-E  E 


n=0  m=-n 


A 

r  —  a  \rj 


where 


(37) 


(38) 


(39) 


(40) 


t  =  l 


Proof:  The  formula  (38)  follows  from  eq.  (36)  and  superposition.  The  error  bound  is  obtained 
from  the  triangle  inequality  and  the  fact  that  the  ratios  pi/r  are  bounded  from  above  by  a/r.  □ 
Suppose  now  that  r  =  2a  in  the  context  of  the  preceding  theorem.  Then  the  error  bound 
(39)  becomes 


W-E  E 

n=0  m=— 71 

and  setting  p  =  log2(l/£)  yields  a  precision  s  relative  to  the  ratio  A/ a. 


(41) 
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4  An  A  log  A  Algorithm 

Theorem  3.2  is  all  that  is  required  to  construct  a  simple  fast  algorithm  of  arbitrary  precision. 
To  reduce  the  number  of  issues  addressed,  we  assume  that  the  particles  are  fairly  homogeneously 
distributed  in  a  square  so  that  adaptive  refinement  is  not  required. 

In  order  to  make  systematic  use  of  multipole  expansions,  we  introduce  a  hierarchy  of  boxes 
which  refine  the  computational  domain  into  smaller  and  smaller  regions.  At  refinement  level  0, 
we  have  the  entire  computational  domain.  Refinement  level  i  +  1  is  obtained  recursively  from 
level  I  by  subdivision  of  each  box  into  eight  equal  parts.  This  yields  a  natural  tree  structure, 
where  the  eight  boxes  at  level  I  + 1  obtained  by  subdivision  of  a  box  at  level  I  are  considered  its 
children. 

Definition  4.1  Two  boxes  are  said  to  be  near  neighbors  if  they  are  at  the  same  refinement 
level  and  share  a  boundary  point  (a  box  is  a  near  neighbor  of  itself). 

Definition  4.2  Two  boxes  are  said  to  be  well  separated  if  they  are  at  the  same  refinement 
level  and  are  not  near  neighbors. 

Definition  4.3  With  each  box  i  is  associated  an  interaction  list,  consisting  of  the  children  of 
the  near  neighbors  of  i’s  parent  which  are  well  separated  from  box  i  (Fig.  4)‘ 

Definition  4.4  With  each  box  i  at  level  I  is  associated  a  multipole  expansion  $/,,•  about  the  box 
center,  which  describes  the  far  field  induced  by  the  particles  contained  inside  the  box. 

The  basic  idea  is  to  consider  clusters  of  particles  at  successive  levels  of  spatial  refinement, 
and  to  compute  interactions  between  distant  clusters  by  means  of  multipole  expansions  when 
possible.  It  is  clear  that  at  levels  0  and  1,  there  are  no  pairs  of  boxes  which  are  well  separated. 
At  level  2,  on  the  other  hand,  sixty-four  boxes  have  been  created  and  there  are  a  number  of  well 
separated  pairs.  Multipole  expansions  can  then  be  used  to  compute  interactions  between  these 
well  separated  pairs  (Fig.  2)  with  rigorous  bounds  on  the  error.  In  fact,  it  is  easy  to  see  that 
the  bound  (39)  applies  with  the  ratio  a/r  <  Thus,  to  achieve  a  given  precision  £,  we  need 

to  use  p  =  iogy5(l/e)  terms. 

It  remains  to  compute  the  interactions  between  particles  contained  in  each  box  with  those 
contained  in  the  box’s  near  neighbors,  and  this  is  done  recursively.  We  first  refine  each  level 

2  box  to  create  level  3.  For  a  given  level  3  box,  we  then  seek  to  determine  which  other  level 

3  boxes  can  be  interacted  with  by  means  of  multipole  expansions.  Since  those  boxes  outside 
the  region  of  the  parent's  nearest  neighbors  are  already  accounted  for  (at  level  2),  they  can  be 
ignored.  Since  interactions  with  near  neighbors  cannot  be  accounted  for  accurately  by  means  of 
an  expansion,  they  can  also  be  ignored  for  the  moment.  The  remaining  boxes  correspond  exactly 
to  the  interaction  list  defined  above  (Fig.  3). 

The  nature  of  the  recursion  is  now  clear.  At  every  level,  the  multipole  expansion  is  formed 
for  each  box  due  to  the  particles  it  contains.  The  resulting  expansion  is  then  evaluated  for  each 
particle  in  the  region  covered  by  its  interaction  list  (Fig.  4) . 
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Figure  2:  The  first  step  of  the  algorithm,  depicted  in  two  space  dimensions  for  clarity.  Interac¬ 
tions  between  particles  in  box  X  and  its  near  neighbors  (grey)  are  not  computed.  Interactions 
between  well  separated  boxes  are  computed  via  multipole  expansions. 


Figure  3:  The  second  step  of  the  algorithm,  depicted  in  two  space  dimensions.  After  refinement, 
note  that  the  particles  in  the  box  marked  X  have  already  interacted  with  the  most  distant  particles 
(light  grey).  They  are  now  well  separated  from  the  particles  in  the  white  boxes,  so  that  these 
interactions  can  be  computed  via  multipole  expansions.  The  near  neighbor  interactions  (dark 
grey)  are  not  computed. 
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Figure  4:  Subsequent  steps  of  the  algorithm.  The  interaction  list  for  box  X  is  indicated  in  white. 
In  three  dimensions,  it  contains  up  to  189  boxes. 


Figure  5:  At  the  finest  level,  interactions  with  near  neighbors  are  computed  directly.  In  three 
dimensions,  there  are  up  to  27  near  neighbors. 


We  halt  the  recursive  process  after  roughly  logg  N  levels  of  refinement.  The  amount  of  work 
done  at  each  level  is  of  the  order  0{N).  To  see  this,  note  first  that  approximately  iVp^  operations 
are  needed  to  create  all  expansions,  since  each  particle  contributes  to  expansion  coefficients. 
Secondly,  from  the  point  of  view  of  a  single  particle,  there  are  at  most  189  boxes  (the  maximum 
size  of  the  interaction  list)  whose  expansions  are  computed,  so  that  189  operations  are 
needed  for  all  evaluations. 

At  the  finest  level,  we  have  created  roughly  8'®**^  =  N  boxes  and  it  remains  only  to 
compute  interactions  between  nearest  neighbors.  By  the  assumption  of  homogeneity,  there  are 
0(1)  particles  per  box,  so  that  this  last  step  requires  about  27  N  operations  (Fig.  5).  The  total 
cost  is  approximately 

189iVp2  logs  N  +  27  N.  (42) 

The  algorithm  just  described  is,  in  essence,  a  nonadaptive  version  of  the  one  proposed  by 
Barnes  and  Hut  (1986),  except  that  it  achieves  arbitrary  precision  through  the  use  of  high 
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order  expansions.  Two-dimensional  schemes  of  this  type  are  due  to  Van  Dommelen  and  Run- 
densteiner  (1989)  and  Odlyzko  and  Schonhage  (1988).  Unfortunately,  while  such  schemes  have 
good  asymptotic  work  estimates,  the  three-dimensional  versions  provide  only  modest  speedups 
at  high  precision  for  the  values  of  N  encountered  in  present  day  applications.  At  N  =  100, 000, 
for  example,  seven  digits  of  accuracy  require  p  «  20,  and  the  N  log  N  scheme  is  only  two  to  three 
times  faster  than  the  direct  O(V^)  method.  In  order  to  significantly  accelerate  the  calculation, 
we  need  some  further  analytic  machinery. 


5  Mathematical  Preliminaries  II 


The  FMM  relies  on  three  translation  operators,  acting  on  either  multipole  (far  field)  or  solid 
harmonic  (local)  expansions.  They  are  described  in  the  next  three  theorems  (Greengard  and 
Rokhlin  1988a;  Greengard  1988). 


Theorem  5.1  (Translation  of  a  Multipole  Expansion)  Suppose  that  I  charges  of  strengths 
qi,q2,'--,gi  are  located  inside  the  sphere  D  of  radius  a  with  center  at  Q  =  {p,a,P),  and  that 
for  points  P  -  (r,  0,  (j>)  outside  D,  the  potential  due  to  these  charges  is  given  by  the  multipole 
expansion 

oo  n  /ytn 

*(P)  =  E  E  («) 

71=0  m=— n 

where  P  -Q  =  {r',0',4>').  Then  for  any  point  P  =  ir,0,<i>)  outside  the  sphere  Di  of  radius 
{a  +  p), 

OO  J 


i=o  fc=~i 


where 


j 

=  E  E 


7X=0  771=— n 


p  J  M* 

♦(^’)-E  E  ■;iii 

jrrO  j 


(44) 

(45) 

»>  U 

(46) 

r-(a-|-/>)i\  r  J 

Definition  5.1  The  linear  operator  mapping  old  multipole  coefficients  {0|},  j  =  0,...,p  and 
k  =  —j,  to  new  multipole  coefficients  {Mj},  j  =  0, . .  .,p  and  k  =  —j,  • .  -  li,  according  to 

eq.  (45)  will  be  denoted  by  Tmm- 

Theorem  5.2  (Conversion  of  a  Multipole  Expansion  into  a  Local  Expansion)  Suppose 
that  I  charges  of  strengths  9i,  92?  *  *  •  >  9/  are  located  inside  the  sphere  Dq  of  radius  a  with  center 
atQ  =  {p,  a,  13),  and  that  p  >  (c-h  l)o  with  c>  1.  Then  the  corresponding  multipole  expansion 
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(43)  converges  inside  the  sphere  Do  of  radius  a  centered  at  the  origin.  Inside  Doi  the  potential 
due  to  the  charges  gi,  92^  •  *  9/  described  by  a  local  expansion: 


00  j 


where 


^iP)  =  E  E 

3=0 k=-j 


3-1^  2^  (_l)„^m-fc.pi+n+l 


n=0  m=— n 

with  Af.  defined  by  eq.  (33).  Furthermore,  for  any  p>  1, 


P  3 


3=0 k=-j 


<  (&M)  A')'"'  , 

y  ca  —  a  J  \cj 


(47) 


(48) 


(49) 


Definition  5.2  The  linear  operator  mapping  truncated  multipole  expansion  coefficients  {Oj}, 
j  =  0,...,p  and  k  =  to  local  coefficients  {Lj},  j  =  0,...,p  and  k  = 

according  to  eq.  (48)  will  be  denoted  by  Tml- 


Theorem  5.3  (Translation  of  a  Local  Expansion) 

Let  Q  =  {p,  a,  fi)  be  the  origin  of  a  local  expansion 


«(^’)  =  E  E  o”-yr(«'..^')T'", 


p  n 

E  E 

71=0  m=— n 

where  P  =  (r,  6, 4>)  o,nd  P  —  Q  =  (r',  O',  <(>') .  Then 

P  3 


j=0  A;=— j 


(50) 


(51) 


where 


if=E  E 


n=j  m=— n 


Qm  .  ^.|m|-|m-fc|-|fcl  ,  ^m-k  .  /3)  •  p”"-' 

(_l)n+i.^m  ’ 


(52) 


A*  defined  by  eq.  (33). 


Definition  5.3  The  linear  operator  mapping  old  local  expansion  coefficients  {O^})  n  =  0, . .  .,p 
and  m  =  — n,  ...,n,  to  new  local  expansion  coefficients  {L^},  n  =  0, . .  .,p  and  m  =  — n,  ...,n, 
according  to  eq.  (52)  will  be  denoted  by  Tll- 
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6  The  original  FMM 

We  can  now  construct  a  scheme  with  cost  proportional  to  iV,  by  using  Theorem  5.2  to  convert 
the  far  field  expansion  of  a  source  box  into  a  local  expansion  inside  a  target  box,  rather  than  by 
direct  evaluation  of  the  far  field  expansion  at  individual  target  positions. 

Definition  6.1  With  each  box  i  at  level  I  is  associated  a  local  expansion  about  the  box 
center,  which  describes  the  potential  field  induced  by  all  particles  outside  box  i ’s  near  neighbors. 

Definition  6.2  With  each  box  i  at  level  I  is  associated  a  local  expansion  about  the  box 
center,  which  describes  the  potential  field  induced  by  all  particles  outside  the  near  neighbors  of 
i ’s  parent. 


ALGORITHM 


[Comment  The  parent  of  a  box  j  will  be  denoted  by  p{j).  The  list  of  children  of  a  box  j  will  be 
denoted  by  c{j).  The  interaction  list  of  a  box  j  will  be  denoted  by 

Upward  Pass 

Initialization 

[Comment  Choose  number  of  refinement  levels  n  «  logg  N,  and  the  order  of  the  multipole  expansion 
desired  p.  The  number  of  boxes  at  the  finest  level  is  then  8”,  and  the  average  number  of  particles 
per  box  is  s  =  A^/(8”).] 


Step  1 

[Comment  Form  multipole  expansions  of  potential  field  due  to  particles  in  each  box  about  the 
box  center  at  the  finest  mesh  level,  via  Theorem  3.2.] 

Step  2 


For  levels  /  =  n  —  1, ...,  2, 

Form  multipole  expansion  about  the  center  of  each  box  at  level  I  by 
merging  expansions  from  its  eight  children  via  Theorem  5.1. 

=  J2kechiid{j)  TMM^i+i,k- 


Downward  Pass 

Initialization 

Set  f'l,!  =  '5^1,2  =  •  •  •  =  ^1,8  =  (0, 0,  ...,0). 
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Step  3 


For  levels  1  =  2, n, 

Form  the  expansion  for  each  box  j  at  level  I, 
by  using  Theorem  5.3  to  shift  the  local  expansion 
of  j's  parent  to  j  itself. 

Form  by  using  Theorem  5.2  to  convert  the 
multipole  expansion  of  each  box  k  in  the 
interaction  list  of  box  j  to  a  local  expansion  about 
the  center  of  box  j,  adding  these  local  expansions  together, 
and  adding  the  result  to  ^i,j. 

+  12keiiistij) 

Step  4 

For  each  particle  in  each  box  j  at  the  finest  level  n, 
evaluate  at  the  particle  position. 

Step  5 

For  each  particle  in  each  box  j  at  the  finest  level  n, 

compute  interactions  with  particles  in  near  neighbor  boxes  directly. 


Since  s  is  the  average  number  of  particles  per  box  at  the  finest  level,  there  are  approximately 
N/s  boxes  in  the  tree  hierarchy.  Therefore,  Step  1  requires  approximately  Np^  work.  Step  2 
requires  {N/s)p'^  work.  Step  3  requires  189{N/s)p'*  work.  Step  4  requires  iVp^  work,  and  Step 
5  requires  27 N  s  work.  Thus,  a  reasonable  estimate  for  the  total  operation  count  is 

191  (y)  p^  +  2Np^  +  27Ns.  (53) 

With  s  =  2p^,  the  operation  count  becomes  approximately 

150iVp2,  (54) 

This  would  appear  to  beat  the  estimate  (42)  for  any  N,  but  there  is  a  subtle  catch.  The 
number  of  terms  p  needed  for  a  fixed  precision  in  the  N  log  N  scheme  is  smaller  than  the  number 
of  terms  needed  in  the  FMM  described  above.  To  see  why,  consider  two  interacting  cubes  A 
and  B  of  unit  volume,  with  sources  in  A  and  targets  in  B.  The  worst-case  multipole  error 
decays  like  (\/3/3)^,  since  y/Z/2  is  the  radius  of  the  smallest  sphere  enclosing  cube  A  and  3/2 
is  the  shortest  distance  to  a  target  in  B.  The  conversion  of  a  multipole  expansion  in  ^4  to  a 
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local  expansion  in  -B,  however,  satisfies  an  error  bound  which  depends  on  the  smallest  sphere 
enclosing  B  as  well  as  the  smallest  sphere  enclosing  A.  ^From  eq.  (49),  the  worst  case  error  is 
less  than  (0.76)^  although  with  more  detailed  analysis,  one  can  show  that  the  error  is  bounded 
by  (0.75)^  (Petersen  et  a/.,  1995). 

In  the  original  FMM  (Greengard  and  Rokhlin  1988;  Greengard  1988),  it  was  suggested  that 
one  redefine  the  nearest  neighbor  list  to  include  “second  nearest  neighbors,”  so  that  boxes  which 
interact  via  multi  pole  expansions  are  separated  by  at  least  two  intervening  boxes  of  the  same 
size.  The  error  can  then  be  shown  to  decay  approximately  like  (0.4)p.  However,  the  number  of 
near  neighbors  increases  from  27  to  125  and  the  size  of  the  interaction  list  increases  from  189  to 
875. 

It  is  clear  that  the  major  obstacle  to  achieving  reasonable  efficiency  at  high  precision  is  the 
cost  of  the  multipole  to  local  translations  (189p'^  operations  per  box).  There  are  a  number  of 
schemes  which  have  been  suggested  for  reducing  the  cost  of  applying  translation  operators.  The 
simplest  is  based  on  rotating  the  coordinate  system  so  that  the  vector  connecting  the  source  box 
B  and  the  target  box  C  lies  along  the  z-axis,  shifting  the  expansion  along  the  z-axis,  and  then 
rotating  back  to  the  original  coordinate  system. 

6.1  The  FMM  using  rotation  matrices 
We  begin  with  the  following  obvious  result 

Lemma  6.1  Consider  a  harmonic  function  given  by 

MP)=t  t  (o"+^) 

n=0  m=— n 

where  (r,  d,  (j>)  are  the  spherical  coordinates  of  the  point  P .  If  we  rotate  the  coordinate  system 
through  an  angle  ^  in  the  positive  sense  about  the  z-axis,  then 


where  (r,  9,  <{>')  are  the  new  coordinates  of  P, 

Definition  6.3  Given  a  rotation  angle  j3,  the  diagonal  operator  mapping  old  multipole  coeffi¬ 
cients  to  rotated  multipole  coefficients  (O^  — )■  Olff  )  will  be  denoted  by  'R.zifi)- 

We  also  need  to  be  able  to  rotate  the  coordinate  system  about  the  y-axis. 

Lemma  6.2  Consider  a  harmonic  function  given  by 

n=0m=“n  ' 
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where  (r,  0,  <j>)  are  the  spherical  coordinates  of  the  point  P.  If  we  rotate  the  coordinate  sys¬ 
tem  through  an  angle  a  in  the  positive  sense  about  the  y-axis,  then  there  exist  coefficients 
R{n,  m,  m',  a)  such  that 


$(P)  = 


where  (r,  0,  <f>')  are  the  new  coordinates  of  P, 

Ln'=  R{n,m,m',a)L!^  (55) 

m=—n 


dTtdi 


771 2=-“  71 


(56) 


Proof:  See  (Biedenharn  and  Louck  1981)  for  a  complete  discussion  and  for  a  variety  of  methods 
which  can  be  used  to  compute  the  coeflBcients  R{n,  m,  m',  a) .  □ 


Lemma  6.3  In  order  to  shift  a  multipole  expansion  a  distance  p  along  the  z-axis,  one  can 
replace  eq.  (45)  with  the  simpler  formula 


71=0  3 


(57) 


In  order  to  convert  a  multipole  expansion  centered  at  the  origin  into  a  local  expansion  centered 
at  (0,0,  p),  one  can  replace  eq.  (48)  with  the  simpler  formula 


(68) 


In  order  to  translate  the  center  of  a  local  expansion  from  the  origin  to  the  point  (0, 0,  p),  one 
can  replace  eq.  (52)  with  the  simpler  formula 


‘  ~  hi  (-!)”■"’  ■  ^ 


Definition  6.4  Given  a  rotation  angle  a,  the  diagonal  operator  mapping  old  multipole  coef¬ 
ficients  to  rotated  multipole  coefficients  according  to  formula  (55)  or  (56)  will  be  denoted  by 
Ry^oi).  The  special  cases  of  the  linear  operators  TmM)  Tml>  TLl  which  shift  a  distance  p 
in  the  z-direction  according  to  the  formulae  (57),  (58),  and  (59)  will  be  denoted  by  Tmm{p)> 

Tj^Lip)>  '^Mp)- 
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We  can  now  combine  Lemmas  6.1,  6.2  and  6.3  to  obtain  the  desired  factorizations  of  Tmm, 
TmLi  TlL‘ 

Lemma  6.4 


Tmm  = 

TmL  = 

Tll  =  '^zi-l^)'^yi~0‘)T£i,{p)Tlyio:)TZz{/^)i 

where  (p,  a,  /?)  is  desired  shifting  vector. 

Clearly,  the  cost  of  applying  Tmm^  TmLi  or  Tll  by  means  of  the  preceding  factorization  is 

0(p2)  +  0{p^)  +  0{p^)  +  O(p^)  H-  0(p^). 

Thus,  the  total  computational  cost  of  the  FMM  can  be  reduced  to  approximately 

191  +  2Np^  +  21Ns. 

With  s  =  3p®/^,  the  operation  count  becomes 

270iVp^/^  +  2iVp^.  (60) 

7  Mathematical  Preliminaries  III 

Over  the  last  few  years,  a  number  of  “fast”  or  diagonal  translation  schemes  have  been  developed 
which  require  0(p^)  work  (Greengard  and  Rokhlin  1988b;  Berman  1995;  Elliott  and  Board  1996). 
Unfortunately,  these  schemes  are  all  subject  to  certain  numerical  instabilities.  The  instabilities 
can  be  overcome,  but  at  additional  cost,  the  details  of  which  we  leave  to  the  cited  papers. 

The  latest  generation  of  fast  algorithms  is  based  on  combining  multipole  expansions  with 
exponential  or  “plane  wave”  expansions.  The  reason  for  using  exponentials  is  that  translation 
corresponds  to  multiplication  and,  like  the  earlier  fast  schemes,  requires  only  0(p^)  work.  Un¬ 
like  in  the  earlier  diagonal  schemes,  however,  no  numerical  instabilities  are  encountered.  The 
two-dimensional  theory  is  described  in  (Hrycak  and  Rokhlin  1995),  and  we  present  the  three- 
dimensional  theory  here. 

Rem2trk  7.1  A  complicating  feature  of  the  new  approach  is  that  six  plane  wave  expansions  will 
be  associated  with  each  box,  one  emanating  from  each  face  of  the  cube.  To  fix  notation,  we  will 
refer  to  the  +z  direction  as  up,  to  the  — z  direction  as  down,  to  the  +y  direction  as  north,  to 
the  -y  direction  as  south,  to  the  +x  direction  as  east,  and  to  the  -x  direction  as  west.  The 
interaction  list  for  each  box  will  be  subdivided  into  six  lists,  one  associated  with  each  direction. 
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Figure  6:  The  Uplist  for  the  box  B  (see  Definition  7.1). 

Definition  7.1  The  Uplist  for  a  box  B  consists  of  those  elements  of  the  interaction  list  which 
lie  above  B  and  are  separated  by  at  least  one  box  in  the  +z-direction  (Fig,  6),  The  Downlist  for 
a  box  B  consists  of  those  elements  of  the  interaction  list  which  lie  below  B  and  are  separated 
by  at  least  one  box  in  the  --z- direction.  The  Northlist  for  a  box  B  consists  of  those  elements  of 
the  interaction  list  which  lie  north  of  B,  are  separated  by  at  least  one  box  in  the  +y~directionf 
and  are  not  contained  in  the  Up  or  Down  lists.  The  Southlist  for  a  box  B  consists  of  those 
elements  of  the  interaction  list  which  lie  south  of  B,  are  separated  by  at  least  one  box  in  the 
^y^direction,  and  are  not  contained  in  the  Up  or  Down  lists.  The  Eastlist  for  a  box  B  consists 
of  those  elements  of  the  interaction  list  which  lie  east  of  B,  are  separated  by  at  least  one  box  in 
the  +x-direction,  and  are  not  contained  in  the  Up,  Down,  North,  or  South  lists.  The  Westlist 
for  a  box  B  consists  of  those  elements  of  the  interaction  list  which  lie  west  of  B,  are  separated 
by  at  least  one  box  in  the  ^x-direction^  and  are  not  contained  in  the  Up,  Down,  North,  or  South 
lists. 

It  is  easy  to  verify  that  the  original  interaction  list  is  equal  to  the  union  of  the  Up,  Down, 
North,  South,  East  and  West  lists.  It  is  also  easy  to  verify  that 

C  €  Uplist {B)  ^  B  e  Downli$t{C) 

C  e  Northlist{B)  ^  B  £  Southlist{C)  (61) 

C  G  Eastlist{B)  BE  Westlist{C) 

Given  a  source  location  P  =  (xq,  yo?  ^o)  ^  target  location  Q  =  (x,  y,  z),  our  starting  point 

is  the  well-known  integral  representation  (Morse  and  Feshbach  1953,  p.  1256) 

_ 1 _ 

y/{x  -  xoy  +  (y  -  yo)^  +  {^- 

_  J__  A(2~2o  )  gt A((x-xo)  cos  a+{y-yo  )  sin  ^g2) 

27r  Jo  Jo  , 
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valid  for  z  >  zq. 

To  get  a  discrete  representation,  we  must  use  an  appropriate  quadrature  formula.  The  inner 
integral,  with  respect  to  a,  is  easily  handled  by  the  trapezoidal  rule  (which  achieves  spectral 
accuracy  for  periodic  functions),  but  the  outer  integral  requires  more  care.  Laguerre  quadrature 
is  an  appropriate  choice  here,  but  even  better  performance  can  be  obtained  using  generalized 
Gaussian  quadrature  rules  (Yarvin  and  Rokhlin  1996).  These  have  been  designed  with  the 
geometry  of  the  interaction  list  in  mind. 

Because  of  the  restriction  that  z  >  zo,v/e  will  assume,  for  the  moment,  that  the  source  P  is 
contained  in  a  box  B  and  that  the  target  Q  lies  in  a  box  C  €  Uplist{B).  The  following  lemma 
describes  several  discrete  approximations  of  the  double  integral  in  (62)  as  double  sums. 

Lemma  7.1  Let  P  e  B  and  Q  e  C  &  Uplist{B),  where  B  is  a  box  of  unit  volume.  Then 


1^: _ ■'p  --A*[(z-2o)-i(a;-xo)cosaj-(y-yo)sinaj]|  ^  (63) 

'rpg 

where  otj  —  2irj /M {k),  and  the  weights  wi, . . . ,  wq,  nodes  Ax, ... ,  Ag,  and  values  A/(l), . . . ,  A/ (9) 
are  given  in  section  12,  Table  1.  (The  total  number  of  exponentials  required  is  109.j 

18  Af(fc) 

|_J; _ ^  p-Afc[(2-2o)-t(®-®o)cosaj-(3/-»o)smaj]|  ^  j^q-6 

'rPQ'tM{k)k 

where  aj  =  2vj  /M  (k),  and  the  weights  wi,...,  wis,  nodes  Ax , . . . ,  Axs,  and  values  M(l) , . . . ,  M(18) 
are  given  in  section  12,  Table  2.  (The  total  number  of  exponentials  required  is  5b8.) 


1 

rpQ 


^  ^  -Ak[(2-zo)-t(®-®o)cosaj-(»-j/o)sinot_,]|  <-95. 10“^°, 

k  k 


(65) 


whereoj  =  2Trj/M{k),  and  the  weights  Wi,...,W3o,  nodes  Ax,...,  A30,  and  values  M(l), ..  .,M(30) 
are  given  in  section  12,  Table  S.  (The  total  number  of  exponentials  required  is  \lbl.) 


Remark  7.2  The  formulae  (63)-(65)  are  somewhat  complex,  but  have  a  simple  interpretation. 
The  outer  sums  use  the  generalized  Gaussian  weights  and  nodes  {tv^.  A*}  obtained  in  (Yarvin 
and  Rokhlin  1996)  to  approximate  the  outer  integral  (with  respect  to  A),  while  the  inner  sums 
use  the  trapezoidal  rule  to  approximate  the  inner  integral  (with  respect  to  a).  The  number  of 
nodes  in  each  inner  integral  depends  on  the  value  A^  for  which  the  integration  is  being  performed, 
and  is  denoted  by  M{k).  These  are  derived  from  standard  estimates  concerning  Bessel  functions 
(Watson  1944,  pp.  227,  255;  Rokhlin  1995). 


Remark  7.3  In  the  remainder  of  this  paper,  we  will  assume  that  the  desired  precision  e  is  clear 
from  the  context  and  will  write 


1 

rpQ 


s(£)  M(fc) 


-EE 


^~Afe(z-zo )  ^tAfc((g-go)  cos  aj+{y-yo )  sin  aj)\ 


<£, 


where  Oj  =  27r j/M{k).  This  is  a  mild  abuse  of  notation,  since  the  weights,  nodes  and  values 
M{k)  depend  on  e  as  well.  The  total  number  of  exponential  basis  functions  used  will  be  denoted 
by  Sqxp^  that 

«(£) 

ik=l 


Corollary  7.1  Let  B  be  a  box  of  unit  volume  centered  at  the  origin  containing  N  charges  of 
strengths  {?/,  I  =  located  at  the  points  {Qi  =  (xi,yi,zi),  I  =  Then  for  any 

P  contained  in  Uplist[B),  the  potential  satisfies 


s{e)M(k) 

l^(^)  “EE  <  Ae,  (67) 

k-l  i=:l 


where  A  =  YliLi  \qi\ 


N 


/=i 


(68) 


Corollary  7*2  (Diagonal  translation)  Let  B  be  a  box  of  unit  volume  centered  at  the  origin 
containing  N  charges  of  strengths  {9/,  I  =  1,..,,  iV},  located  at  the  points  {Qi  =  {^uyu^i)’,  I  = 
1,  and  let  C  be  a  box  in  Uplist{B)  centered  at  (xi, yi,zi).  For  P  eC,  let  the  potential 

$(P)  be  approximated  by  the  exponential  expansion  centered  at  the  origin 

s(ff)  M{k) 

$(P)  =  2  ^  (gg) 

A;=:l  j=l 


Then 


where 


s{e)  M(k) 
tel  i=i 

V{kJ)  =  W{kJ)  cosa,+yi  sinaj) 


(70) 

(71) 


Definition  7.2  The  diagonal  operator  mapping  the  original  set  of  exponential  expansion  coeffi¬ 
cients  {W^(A:,  j)}  to  the  shifted  exponential  expansion  coefficients  {V{k,j)}  according  to  eg.  (71) 
will  be  denoted  by  where  BC  =  (xi,  yi ,  Zi)  is  the  vector  from  the  center  of  B  to  the  center 
ofC. 
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In  the  FMM,  we  will  be  given  the  multipole  expansion  of  a  charge  distribution  for  a  box  B 
rather  than  the  charge  distribution  itself,  and  will  need  to  convert  it  to  an  exponential  expansion. 
This  is  accomplished  by  the  following  theorem. 

Theorem  7.1  Let  B  be  a  box  of  unit  volume  centered  at  the  origin  containing  N  charges  of 
strengths  {qi,  I  =  located  at  the  points  {Qi  =  ixi,yi,zi),  I  =  1,  ...,iV}.  Let  P  £  C  e 

Uplist{B)  and  suppose  that  the  potential  #(F)  is  given  as  the  multipole  expansion 

^iP)  =  E  "E  P2) 


Then 


5(e)  M(fc) 

|$(P)  -  2  <  Ae, 

k=i  i=i 


where  A  =  J2iLi  k/i 


oo  oo 


Mr 


MW 


v/(n-m)!(n  +  m)! 


(73) 


(74) 


Proof;  The  formula  (74)  follows  from  the  definitions  (30)  (31)  and  (32).  The  estimate  (73) 
follows  from  Corollary  7.1. 


Definition  7.3  The  linear  operator  mapping  a  finite  multipole  expansion  {M^},  n  =  0,...,p 
andm  =  -n,  ...,n,to  the  corresponding  set  of  coefficients  in  an  exponential  expansion  {W{k,j)} 
according  to  eq.  (74)  will  be  denoted  byCMX- 

Once  the  multipole  expansion  for  a  source  box  has  been  converted  into  an  exponential  ex¬ 
pansion  (via  Theorem  7.1)  and  translated  to  a  target  box  center  (via  Corollary  7.2),  we  will  need 
to  convert  the  exponential  expansion  back  into  a  solid  harmonic  series.  The  following  theorem 
provides  the  necessary  machinery. 

Theorem  7.2  Let  B  be  a  box  of  unit  volume  containing  N  charges  of  strengths  {qi,  1  =  1, ...,  N}, 
located  at  the  points  {Qi  =  (x/,  yi,  z/),  I  =  1, ...,  N).  Let  P  be  contained  in  aboxC  £  Uplist{B), 
centered  at  the  origin,  and  suppose  that  the  potential  ^{P)  is  given  as  the  exponential  expansion 

5(e)  M(k) 

$(P)  -  2  (75) 

it=l  j=l 


where  A  =  IqiI-  Then 


IW-E  E  L^-Y)r{0,<l>)-r^\<Ae, 

n=0  m^—n 


(76) 
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where 


(77) 


rm  _ 


Proof:  Eq.  (77)  follows  easily  from  the  formula  (Hobson  1955,  p.  123) 

(z  +  ix  cosa  +  iy  sin  a)"  = 

r"  I  F„(cos^)  +2J2  (O'”"/  ■\-i(-l)’"^r(cos^)  cosm((f>  -  a)  1 , 

I  m=i  (n  +  m).  j 

where  (r,  ff,  </>)  are  the  spherical  coordinates  of  the  point  with  Cartesian  coordinates  (x,  y,  z).  □ 

Definition  7.4  The  linear  operator  mapping  the  set  of  coefficients  in  an  exponential  expansion 
to  the  coefficients  in  the  corresponding  truncated  solid  harmonic  expansion  {L^}^ 
n  =  0, . .  .,p  and  m  =  -n, . . . ,  n,  according  to  eq.  (77)  will  be  denoted  by  CxL> 


Remark  7.4  Theorems  7.1  and  7.2,  like  Theorem  5.2,  are  not  quite  the  right  tools  needed  to 
obtain  rigorous  error  estimates  for  the  FMM.  In  both  cases,  we  have  ignored  the  fact  that  the 
multipole  and  local  expansions  are  truncated.  It  is  straightforward  but  tedious  to  derive  pre¬ 
cise  estimates,  and  we  ignore  this  issue  in  the  present  paper.  We  should  note  that  the  nature 
of  such  estimates  depends  on  how  the  multipole- to-exponential,  multipole-to-solid  harmonic  or 
exponential-to-solid  harmonic  conversion  is  carried  out.  Formulae  (74),  (77)  and  (48)  are  the 
easiest  to  derive,  being  the  Taylor  expansions  of  the  potential  $.  However,  each  of  these  con¬ 
versions  is  simply  a  linear  mapping  from  one  set  of  basis  functions  to  another.  The  formulae 
(77),  (74),  and  (48)  can  be  shown  to  correspond  to  minimizing  the  L2  error  on  the  surface  of 
a  sphere  enclosing  the  given  source  or  target  box.  One  could  choose  a  variety  of  other  possible 
projections,  such  as  minimizing  the  L2  or  Lqo  error  on  the  surface  of  the  corresponding  box  itself. 

Remark  7.5  By  inspection  of  formula  (74),  it  is  clear  that  the  cost  of  applying  the  operator 
Tmx  is  p^  s{s)  +pSexp-  The  same  is  true  for  the  operator  Txl-  If  is  also  worth  noting  that  fast 
Fourier  transforms  can  be  used  to  reduce  the  cost  of  the  outer  sum  in  the  truncated  version  of 
formula  (74)  and  the  inner  sum  in  the  truncated  version  of  formula  (77). 

Corollary  7.3  (Multipole  to  local  factorization)  Let  B  be  a  box  of  unit  volume  and  C  a 
box  in  Uplist{B).  IfTML  Ihe  translation  operator  converting  the  multipole  expansion  centered 
in  B  to  the  local  expansion  centered  in  C,  then 

Tml  =  CxL  •  (78) 

Remark  7.6  It  is  important  to  note  that  Lemma  7.1  provides  a  carefully  designed  quadrature 
formula  which  assumes  that  the  source  box  B  has  unit  volume  and  that  the  target  is  in  JB’s 
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Uplist.  In  order  to  use  these  quadrature  weights  and  nodes,  we  need  to  rescale  the  multipole 
and  local  expansions  so  that  the  box  dimension  always  has  unit  volume.  To  accomplish  this,  if 

n=Om=— n 

is  the  multipole  expansion  for  a  box  B  of  volume  we  simply  write 

n=Om=— n  V  /  / 

The  local  expansion  for  a  target  box  in  JS’s  interaction  list  is  accumulated  as 

j=Ok=-j  ^  ^ 

Corollary  7.4  (Scaled  multipole  to  local  factorization)  Let  B  be  a  box  of  volume  and  C 
a  box  in  Uplist{B),  with  the  vector  from  the  center  of  B  to  the  center  of  C  given  by  (xi,yi,«i). 
If  is  the  translation  operator  converting  the  multipole  expansion  centered  in  B  to  the  local 
expansion  centered  in  C,  then 

TmL  =  'L>d,L  CxlL>^CmX  (®2) 


where 

and^  =  BlJId. 

The  cost  of  a  single  multipole-to-local  translation  using  the  factorization  of  Corollary  7.4  is 

2p^  +  2p^s(e)  +  2pSexp  «  2p^) 

since  s  «  p  and  S^xp  «  If  each  translation  were  carried  out  in  this  manner,  we  would 
not  improve  on  the  rotation  based  scheme  discussed  in  section  6.1.  However,  one  the  multipole 
expansion  for  a  box  B  has  been  converted  to  an  exponential  expansion  (via  the  application  of 
Vd,M  and  Cmx),  it  can  be  translated  to  each  box  in  its  Uplist  at  a  cost  of  Sexp  »  P^  operations. 
Conversely,  once  a  box  B  has  accumulated  all  the  exponential  expansions  transmitted  from  its 
Downlist  (see  eq.  (61)),  a  single  application  of  the  operators  CxL  and  Vd,L  yields  the  local 
harmonic  expansion  describing  the  field  due  to  the  sources  in  the  Downlist  of  box  B  (Fig.  7). 

Up  to  this  point,  we  have  considered  only  the  exponential  representation  needed  to  shift 
information  in  the  upward  {+z)  direction.  As  noted  in  the  beginning  of  this  section,  however, 
there  are  six  outgoing  directions  which  need  to  be  accounted  for.  The  most  straightforward  way 
of  generating  the  appropriate  expansions  is  to  rotate  the  coordinate  system  so  that  the  i;-axis 
points  in  the  desired  direction.  The  following  lemma  provides  the  necessary  formulae. 
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Multipole  Rep.  Local  Rep. 


Exponential  Rep.  Exponential  Rep. 

Figure  7:  In  the  new  FMM,  a  large  number  of  multipole-to-local  translations,  costing  0{p^)  or 
O(p^)  work,  can  be  replaced  by  a  large  number  of  exponential  translations,  costing  0{p^)  work. 


Lemma  7.2  Let  B  be  a  box  of  volume  cP  and  C  a  “target”  box.  Let  Tml  translation 

operator  converting  the  multipole  expansion  centered  in  B  to  the  local  expansion  centered  in  C . 
IfC£  Downlist{B),  then 

^  CxlDbcCmX  Tlyix)  Vi,M- 

If  C  ^  Eastlist{B) ,  then 

j'East  _  (2)  CxlT^^Cmx  ^»(’’'/2)  L>d,M- 

If  C  £Westlist{B),  then 

—  I^d.L  i'XL'i^'BC^^X  'Bd.M- 

IfC£  Northlist(B)s,  then 

j-North  ^  Ty^^nyi-x/2)  n,{-7rf2)CxLT>BcCMX  nyin/2)  n,{x/2)  Vd,M- 

If  C  ^  Southlist{B),  then 

J-Sc^th  ^  2?^^ ny{xf2)  nz{-xl2)  CxlI>wCmx  ^y(-7r/2)  Tl,{x/2)  Vd,M. 

where  BC  is  the  appropriately  scaled  vector  from  the  center  of  B  to  the  center  of  C  in  the  rotated 
coordinate  system.  The  operators  IZz  o>Tid  TZy  are  defined  in  section  6.1. 

Definition  7*5  Let  T^i  be  given  by  the  operator  Tml  defined  in  eq.  (82) •  Then,  for 
Dir  e  {Up,  Down,  East,  West,  North,  South},  we  will  write 
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so  that 


=  Vi^LCxu 

=  Cmx  T>d,M^ 

opown  ^  Vd,Lnyi-w)CxL, 
=  CMxny{ir)Vd,M. 


etc. 

We  are  now  in  a  position  to  describe  the  new  FMM  in  detail. 


8  The  New  FMM 


ALGORITHM 


[Comment  The  parent  of  a  box  j  will  be  denoted  by  p{j)-  The  list  of  children  of  a  box  j  will  be 
denoted  by  c{j).  For  each  box  j,  the  “outgoing"  exponential  expansion  with  coefficients  {W{n,m)}, 
n  =  1, . . . ,  s(e);  m  =  1, . . . ,  M(n),  will  be  denoted  by  Wj.  We  will  also  associate  an  “incoming" 
exponential  expansion  with  each  box,  denoted  by  Vj\ 

Upward  Pass 

Initialization 

[Comment  Choose  number  of  refinement  levels  n  «  logg  N ,  and  the  order  of  the  multipole  expansion 
desired  p.  The  number  of  boxes  at  the  finest  level  is  then  8”,  and  the  average  number  of  particles 
per  box  is  s  =  iV/(8").] 

Step  1 

Form  multipole  expansions  $n,t  of  potential  field  due  to  particles  in  each  box  about  the  box  center 
at  the  finest  mesh  level,  via  Theorem  3.2. 


Step  2 


Do  for  levels  I  =  n—  1, 2, 

Form  multipole  expansion  about  the  center  of  each  box  at  level  I  by 
merging  expansions  from  its  eight  children  via  Theorem  5.1. 

=  J2k€child{j)  TMM^l+i,k. 

(In  applying  Tmm>  use  the  factorization  of  Lemma  6.4.) 

End  do 
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Downward  Pass 


Initialization 

Set  =  'i'1,2  =  •  •  •  =  ^1,8  =  (0)  0, 0). 

Step  SA 

Do  for  levels  /  =  2, n. 

Form  the  expansion  for  each  box  j  at  level  /  by  using  Theorem  5.3  to 
shift  the  local  ^  expansion  of  j's  parent  to  j  itself. 

%j  =  TLL^i-i,p(j). 

(In  applying  Tll,  use  the  factorization  of  Lemma  6.4.) 

Set 

Step  SB 

[Comment  For  each  direction  Dir  =  Up,  Down,  North,  South,  East,  West,  the  opposite  direction 
will  be  denoted  by  —Dir,  so  that  —Up  —  Down,  — Down  =  Up,  etc.  Thus,  if  a  box  B  sends  an 
outgoing  expansion  in  direction  Dir  to  Box  C  on  its  Dirlist,  then  C  can  be  viewed  as  receiving  the 
expansion  from  B  which  is  an  element  of  its  —Dirlist.  (see  eq.  (61)).] 

Do  for  Dir  =  Up,  Down,  North,  South,  East,  West, 

For  each  box  j  at  level  I,  convert  the  multipole  expansion 
into  the  "outgoing”  exponential  expansion  for  direction  Dir. 

Wj  = 

For  each  box  j  at  level  I,  collect  the  “outgoing”  exponential 
expansions  from  the  —Dirlist  of  box  j  as  an  “incoming” 
exponential  expansion 

—  Ylke-Dirlist^kj 

where  kj  is  the  appropriately  scaled  vector  from  the  center  of  box  k  to 
the  center  of  box  j  in  the  rotated  coordinate  system . 

For  each  box  j  at  level  I,  convert  the  accumulated  “incoming”  exponential 
expansion  Vj  into  a  local  harmonic  expansion  and  add  result  to 

End  do 
End  do 


Step  4 

For  each  particle  in  each  box  j  at  the  finest  level  n, 
evaluate  at  the  particle  position. 
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Step  5 


For  each  particle  in  each  box  j  at  the  finest  level  n, 

compute  interactions  with  particles  in  near  neighbor  boxes  directly. 


Since  we  are  using  the  rotation  scheme  for  applying  7mm  7ll  ii^  Steps  2  and  3 A, 
these  now  require  a  total  of  (^N/s)  work,  where  s  is  the  number  of  particles  per  box  on  the 
finest  level.  In  Step  3B,  the  applications  of  the  multipole  to  exponential  operators  and  the 
exponential-to-local-operators  QP*’'  require  a  total  of  approximately  6p^(iV/s)  work,  while  the 
exponential  translations  require  approximately  189  {N /s)  work.  The  total  operation  count  is 
therefore  of  the  order  ^ 

m  —  p^  +  2Np'^  +  27Ns  +  Q—p^. 

S  5 

With  s  =  2p,  the  total  operation  count  is  about 

l50Np  +  5Np‘^. 


8.1  Current  improvements 

There  are  a  number  of  ways  in  which  the  algorithm  described  above  has  been  accelerated.  Sym¬ 
metry  considerations,  for  example,  allow  the  pairs  of  operators  ^'pNorth^  pSouth'^^ 

and  pWest-^  jjg  applied  simultaneously.  The  same  is  true  for  the  adjoint  pairs 

etc.  Thus,  the  6p^{N/s)  work  needed  in  Step  3B  can  be  replaced  by  Zp^{N/s)  work. 

Even  more  significant  is  the  fact  that  the  number  of  translations  per  box  can  be  reduced  from 
189  to  less  than  40.  To  see  why,  suppose  that  a  box  B  at  level  I  has  eight  children,  denoted 
Si , . . . ,  Bg,  and  that  boxes  Ci , . . . ,  Cj  lie  in  the  Uplist  of  each  child.  In  the  new  FMM  described 
above,  we  accumulated  an  “incoming”  exponential  expansion  in  each  box  Cj  as 

k=l 

where  Wk  is  the  “outgoing”  exponential  expansion  for  Bk-  Repeating  this  for  y  =  1, ...,J 
requires  a  total  of  8  J  translations.  Since  all  translations  are  diagonal,  however,  it  is  easy  to 
verify  that 

8 

Vj  = 

k=l 

*=1 

Thus,  by  first  merging  the  “outgoing”  expansions,  and  then  translating  their  sum  to  each  target 
box  Cj,  only  8+J  translations  are  needed.  It  should  be  emphasized  that  this  improvement  relies 
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on  the  diagonal  form  of  the  operators.  One  could  try  to  merge  expansions  in  this  manner  in  the 
context  of  the  original  FMM,  but  the  local  expansion  coefficients  computed  with  and  without 
merging  would  not  be  the  same.  There  would  be  a  significant  loss  of  precision,  consistent  with 
the  error  bound  (49). 

8.2  Further  improvements 

There  are  several  ways  in  which  the  scheme  can  be  accelerated  which  have  not  been  incorporated 
into  the  existing  code.  The  most  significant  of  these  is  probably  a  change  in  the  choice  of  the 
translation  operators  Tmm  and  7ll,  as  well  as  the  multipole-to-exponential  and  exponential-to- 
local  conversion  operators  Cjv/x  and  Cxl-  As  mentioned  previously,  the  obvious  formulae  (45), 
(52),  (74),  and  (77)  are  obtained  via  Taylor  expansion  and  are  clearly  not  optimal.  Preliminary 
numerical  experiments  indicate  that  replacing  them  with  more  carefully  optimized  tools  will 
reduce  the  cost  of  these  calculations  within  the  FMM  by  a  factor  of  three.  Furthermore,  the 
improvement  described  in  Remark  7.5  has  not  yet  been  implemented;  we  are  using  the  explicit 
matrix  form  of  the  discrete  Fourier  transform  in  applying  Cmx  and  CxL^  rather  than  the  FFT. 

The  incorporation  of  all  these  modifications  is  likely  to  reduce  the  overall  cost  by  a  factor  of 
two. 


9  Numerical  Results 

The  new  FMM  has  been  implemented  in  Fortran  77  and  tested  on  uniform  random  distributions. 
The  results  of  our  experiments  are  summarized  in  Tables  1-4,  with  all  times  calculated  in  seconds 
using  a  Sun  Ultra-1/140  workstation.  In  each  table,  the  first  column  lists  the  number  of  particles, 
the  second  column  lists  the  number  of  levels  used  in  the  multipole  hierarchy,  the  third  column 
lists  the  order  of  the  multipole  expansion  used,  and  the  fourth  column  lists  the  corresponding 
number  of  exponential  basis  functions.  Columns  five  and  six  indicate  the  times  required  by  the 
FMM  and  the  direct  calculation,  respectively,  and  column  seven  lists  the  P  norm  of  the  error  in 
the  FMM  approximation 

p  (tL,  i»fe)  -  *(x.)in 

V  1:2:.  ) 

For  the  largest  simulations,  with  N  >  10000,  we  have  carried  out  the  direct  calculation  on  a 
subset  of  only  100  particles.  The  stated  times,  indicated  in  parentheses,  are  then  computed  by 
extrapolation  and  the  errors  are  obtained  by  restricting  the  formula  (83)  to  this  subset. 

10  Extensions  and  Generalizations 

The  scheme  presented  in  this  paper  is  not  adaptive  and  assumes  that  the  distribution  of  points 
is  reasonably  uniform  in  space.  In  order  to  handle  more  general  distributions,  one  needs  to 
allow  some  regions  to  be  subdivided  to  finer  refinement  levels  than  others.  Adaptive  structures 
of  this  type  have  been  designed  by  several  groups  (Carrier  et  ah  1988;  Van  Dommelen  and 


1/2 

(83) 
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Table  1:  Timing  results  for  the  FMM  using  fifth  order  expansions  and  twenty-eight  exponential 
basis  functions. 


N 

Levels 

P 

Sexp 

Tfmm 

Tdir 

Error 

500 

3 

5 

28 

0.18 

0.20 

4.5  •  10"^ 

5000 

4 

5 

28 

1.9 

20.1 

7.6  •  10-3 

40000 

5 

5 

28 

20 

(1461) 

7.0  •  10-3 

300000 

6 

5 

28 

175 

(82475) 

1.3  •  10-2 

Table  2:  Timing  results  for  the  FMM  using  ninth  order  expansions  and  109  exponential  basis 
functions. 


N 

Levels 

P 

Sexp 

Tfmm 

Tdir 

Error 

2000 

3 

9 

109 

1.4 

3.37 

1.4-10-^ 

10000 

4 

9 

109 

7.9 

83 

3.6  •  lO-'* 

80000 

5 

9 

109 

111 

(5838) 

4.1  •  10-“* 

Table  3t  Timing  results  for  the  FMM  using  eighteenth  order  expansions  and  558  exponential 
basis  functions. 


N 

Levels 

P 

Sexp 

Tfmm 

Tdir 

Error 

4000 

3 

18 

558 

8.3 

13.4 

1.1  •  10-^ 

25000 

4 

18 

558 

68 

(567) 

1.5  •  10-’’ 

150000 

5 

18 

558 

495 

(20100) 

1.9  •  lO-’’ 

Table  4:  Timing  results  for  the  FMM  using  thirtieth  order  expansions  and  1751  exponential  basis 
functions. 


N 

Levels 

P 

Sexp 

Tfmm 

Tdir 

Error 

5000 

3 

30 

1751 

22 

20.8 

6.2  •  10-^2 

50000 

4 

30 

1751 

316 

(2280) 

6.2  •  10-^2 
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Rundensteiner  1989;  Nabors  et  al.  1994)  and  we  are  in  the  process  of  incorporating  these 
structures  into  the  new  FMM. 

While  a  number  of  techniques  now  exist  for  high  frequency  scattering  problems  (  Rokhlin 
1988,  1990,  1993;  Canning  1989,  1992,  1993;  Coifman  and  Meyer  1991;  Bradie  et  al.  1993; 
Coifman  et  al.  1993,  1994;  Wagner  and  Chew  1994;  Epton  and  Dembart  1995),  an  important 
generalization  of  the  algorithm  of  this  paper  is  to  the  calculation  of  potentials  governed  by  the 
Helmholtz  equation  at  low  frequency.  By  this  we  mean  an  environment  in  which  the  region  of 
interest  is  no  more  than  a  few  wavelengths  in  size,  but  contains  a  large  number  of  discretization 
points  (for  example,  due  to  the  complexity  of  some  structure  being  modeled).  Algorithms  for 
such  problems  are  currently  being  designed. 

11  Conclusions 

A  new  version  of  the  FMM  has  been  developed.  It  is  based  on  a  new  diagonal  form  for  transla¬ 
tion  operators,  and  is  significantly  faster  than  previous  implementations  at  any  desired  level  of 
precision.  Of  particular  interest  is  the  fact  that  high  precision  calculations  have  been  brought 
within  practical  reach. 


12  Tables:  Quadrature  weights  and  nodes 


Table  5:  Columns  1  and  2  contain  the  nine  weights  and  nodes  needed  for  discretization  of  the 
outer  integral  in  (62)  at  three  digit  accuracy.  Column  3  contains  the  number  of  discretization 
points  needed  in  the  inner  integral,  which  we  denote  by  M{k). 


Node 

Weight 

M{k) 

0.09927399673971 

0.24776441819008 

4 

0.47725674637049 

0.49188566500464 

7 

1.05533661382183 

0.65378749137677 

11 

1.76759343354008 

0.76433038408784 

15 

2.57342629351471 

0.84376180565628 

20 

3.44824339201583 

0.90445883985098 

20 

4.37680983554726 

0.95378613136833 

24 

5.34895757205460 

0.99670261613218 

7 

6.35765785313375 

1.10429422730252 

■n 
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Table  6:  Columns  1  and  2  contain  the  eighteen  weights  and  nodes  for  discretization  of  the  outer 
integral  in  (62)  at  six  digit  accuracy.  Column  3  contains  the  number  of  discretization  points 
needed  in  the  inner  integral,  which  we  denote  by  M{k). 


Node 

Weight 

M(k) 

0.05278852766117 

0.13438265914335 

5 

0.26949859838931 

0.29457752727395 

8 

0.63220353174689 

0.42607819361148 

12 

1.11307564277608 

0.53189220776549 

16 

1.68939496140213 

0.61787306245538 

20 

2.34376200469530 

0.68863156078905 

25 

3.06269982907806 

0.74749099381426 

29 

3.83562941265296 

0.79699192718599 

34 

4.65424734321562 

0.83917454386997 

38 

5.51209386593581 

0.87570092283745 

43 

6.40421268377278 

0.90792943590067 

47 

7.32688001906175 

0.93698393742461 

51 

8.27740099258238 

0.96382546688788 

56 

9.25397180602489 

0.98932985769673 

59 

10.25560272374640 

1.01438284597917 

59 

11.28208829787774 

1.04003654374165 

51 

12.33406790967692 

1.06815489269567 

4 

13.41492024017240 

1.10907580975537 

1 
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Table  7:  Columns  1  and  2  contain  the  thirty  weights  and  nodes  for  discretization  of  the  outer 
integral  in  (62)  at  ten  digit  accuracy.  Column  3  contains  the  number  of  discretization  points 
needed  in  the  inner  integral,  which  we  denote  by  M(k). 


Node 

Weight 

M{k) 

0.03239542384523 

0.08289159611006 

7 

0.16861844033714 

0.18838810673274 

10 

0.40611377169029 

0.28485143005306 

14 

0.73466473057596 

0.37041553715895 

18 

1.14340561998398 

0.44539043894975 

22 

1.62232408412252 

0.51100452150290 

26 

2.16276138867422 

0.56865283856139 

30 

2.75739199003682 

0.61958013174010 

35 

3.40002470112078 

0.66481004321965 

39 

4.08539104793552 

0.70517204769960 

43 

4.80897515497095 

0.74134967169016 

48 

5.56688915983444 

0.77392103530415 

53 

6.35578243654166 

0.80338600122756 

57 

7.17277232990713 

0.83018277269650 

62 

8.01538803542112 

0.85469824839953 

66 

8.88152313049502 

0.87727539085565 

71 

9.76939480982937 

0.89821948245755 

76 

10.67750922034750 

0.91780416582368 

80 

11.60463289992789 

0.93627766216629 

85 

12.54977061299652 

0.95386940504388 

89 

13.51215012257297 

0.97079739700556 

94 

14.49121482655196 

0.98727684670885 

97 

15.48662587630224 

1.00353112433459 

103 

16.49827659770404 

1.01980697905712 

107 

17.52632405530625 

1.03639774457222 

110 

18.57124579700721 

1.05368191266322 

112 

19.63393428118300 

1.07219343903929 

108 

20.71585163675095 

1.09278318162014 

84 

21.81939113866225 

1.11737373706779 

4 

22.95080495008893 

1.15786184931141 

1 
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